pl condition
- North America > United States > California (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- (6 more...)
On the Convergence to a Global Solution of Shuffling-Type Gradient Algorithms Lam M. Nguyen
Stochastic gradient descent (SGD) algorithm is the method of choice in many machine learning tasks thanks to its scalability and efficiency in dealing with large-scale problems. In this paper, we focus on the shuffling version of SGD which matches the mainstream practical heuristics. We show the convergence to a global solution of shuffling SGD for a class of non-convex functions under over-parameterized settings.
- North America > United States > California (0.04)
- North America > United States > New York > Tompkins County > Ithaca (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- (7 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.88)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- North America > United States > Maryland > Baltimore (0.04)
- Oceania > Australia > New South Wales > Sydney (0.04)
- (10 more...)
A Generalized Alternating Method for Bilevel
Bilevel optimization has recently regained interest owing to its applications in emerging machine learning fields such as hyperparameter optimization, meta-learning, and reinforcement learning. Recent results have shown that simple alternating (implicit) gradient-based algorithms can match the convergence rate of single-level gradient descent (GD) when addressing bilevel problems with a strongly convex lower-level objective. However, it remains unclear whether this result can be generalized to bilevel problems beyond this basic setting.
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- North America > United States > Maryland > Baltimore (0.04)
- Oceania > Australia > New South Wales > Sydney (0.04)
- (10 more...)
our technical contributions and the importance of the work
We thank the reviewers for their constructive comments. We provide the first convergence result for alternating GDA for more general problems. Unlike simultaneous GDA, our alternating GDA uses different learning rates for the primal and dual variables. PL function and introducing a balancing parameter to establish the contraction. Significant advance has been recognized by utilizing the PL conditions in these field.
- Asia > China (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Asia > Middle East > Jordan (0.04)
- North America > United States > Iowa > Johnson County > Iowa City (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Washington > King County > Bellevue (0.04)
- (2 more...)
- Asia > China (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Asia > Middle East > Jordan (0.04)